Overview

Dataset statistics

Number of variables: 13
Number of observations: 29101
Missing cells: 3043
Missing cells (%): 0.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.9 MiB
Average record size in memory: 104.0 B
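
The derived figures in this block follow from the raw counts. The sketch below reproduces the arithmetic under two assumptions that the report does not state explicitly: the missing-cell percentage is taken over all cells (rows times columns), and the total memory is roughly the number of rows times the average record size. The variable names are illustrative.

```python
# Raw counts copied from the dataset statistics.
n_variables = 13
n_observations = 29101
missing_cells = 3043
avg_record_bytes = 104.0

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total memory is about rows x average record size.
total_mib = n_observations * avg_record_bytes / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Both results round to the values reported here (0.8% and 2.9 MiB), which suggests the assumptions are reasonable.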

Variable types

Categorical: 2
Numeric: 10
Boolean: 1

Warnings

pickup_dt has a high cardinality: 4343 distinct values (High cardinality)
temp is highly correlated with dewp and 1 other field (High correlation)
dewp is highly correlated with temp (High correlation)
sd is highly correlated with temp (High correlation)
vsb is highly correlated with pcp01 (High correlation)
dewp is highly correlated with temp and 1 other field (High correlation)
pcp01 is highly correlated with vsb (High correlation)
sd is highly correlated with temp and 1 other field (High correlation)
temp is highly correlated with dewp (High correlation)
sd is highly correlated with dewp and 2 other fields (High correlation)
borough is highly correlated with pickups (High correlation)
dewp is highly correlated with sd and 2 other fields (High correlation)
pickups is highly correlated with borough (High correlation)
temp is highly correlated with sd and 1 other field (High correlation)
slp is highly correlated with sd and 1 other field (High correlation)
borough has 3043 (10.5%) missing values (Missing)
pickup_dt is uniformly distributed (Uniform)
borough is uniformly distributed (Uniform)
pickups has 5567 (19.1%) zeros (Zeros)
spd has 3596 (12.4%) zeros (Zeros)
dewp has 303 (1.0%) zeros (Zeros)
pcp01 has 26468 (91.0%) zeros (Zeros)
pcp06 has 23460 (80.6%) zeros (Zeros)
pcp24 has 18631 (64.0%) zeros (Zeros)
sd has 20167 (69.3%) zeros (Zeros)

Reproduction

Analysis started: 2021-08-14 17:30:19.448068
Analysis finished: 2021-08-14 17:30:42.636455
Duration: 23.19 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

pickup_dt
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 4343
Distinct (%): 14.9%
Missing: 0
Missing (%): 0.0%
Memory size: 227.5 KiB

2015-02-15 16:00:00    7
2015-06-11 04:00:00    7
2015-04-20 21:00:00    7
2015-01-23 01:00:00    7
2015-05-31 16:00:00    7
Other values (4338)    29066

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 552919
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
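
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the report's own implementation; the helper name `category_counts` and the single sample value are assumptions chosen for illustration. The two-letter codes `Nd`, `Pd`, `Po` and `Zs` stand for Decimal Number, Dash Punctuation, Other Punctuation and Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# One value in the same format as the pickup_dt column.
sample = ["2015-01-01 01:00:00"]
counts = category_counts(sample)
for code, count in counts.most_common():
    print(code, count)
```

Each 19-character timestamp contributes 14 digits, 2 hyphens, 2 colons and 1 space, which is consistent with the four distinct categories listed for this variable.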

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2015-01-01 01:00:00
2nd row: 2015-01-01 01:00:00
3rd row: 2015-01-01 01:00:00
4th row: 2015-01-01 01:00:00
5th row: 2015-01-01 01:00:00

Common Values

Value                  Count   Frequency (%)
2015-02-15 16:00:00    7       < 0.1%
2015-06-11 04:00:00    7       < 0.1%
2015-04-20 21:00:00    7       < 0.1%
2015-01-23 01:00:00    7       < 0.1%
2015-05-31 16:00:00    7       < 0.1%
2015-03-17 06:00:00    7       < 0.1%
2015-06-22 11:00:00    7       < 0.1%
2015-05-06 09:00:00    7       < 0.1%
2015-02-27 14:00:00    7       < 0.1%
2015-06-12 21:00:00    7       < 0.1%
Other values (4333)    29031   99.8%

Length

Histogram of lengths of the category
Value                 Count   Frequency (%)
22:00:00              1237    2.1%
09:00:00              1236    2.1%
19:00:00              1233    2.1%
20:00:00              1233    2.1%
07:00:00              1232    2.1%
21:00:00              1230    2.1%
08:00:00              1229    2.1%
17:00:00              1228    2.1%
16:00:00              1226    2.1%
10:00:00              1225    2.1%
Other values (195)    45893   78.9%

Most occurring characters

Value               Count    Frequency (%)
0                   201596   36.5%
1                   62876    11.4%
-                   58202    10.5%
:                   58202    10.5%
2                   54544    9.9%
5                   39446    7.1%
(space)             29101    5.3%
3                   12735    2.3%
6                   10236    1.9%
4                   10059    1.8%
Other values (3)    15922    2.9%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       407414   73.7%
Dash Punctuation     58202    10.5%
Other Punctuation    58202    10.5%
Space Separator      29101    5.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0201596
49.5%
162876
 
15.4%
254544
 
13.4%
539446
 
9.7%
312735
 
3.1%
610236
 
2.5%
410059
 
2.5%
75358
 
1.3%
85357
 
1.3%
95207
 
1.3%
Dash Punctuation
ValueCountFrequency (%)
-58202
100.0%
Space Separator
ValueCountFrequency (%)
29101
100.0%
Other Punctuation
ValueCountFrequency (%)
:58202
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common552919
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0201596
36.5%
162876
 
11.4%
-58202
 
10.5%
:58202
 
10.5%
254544
 
9.9%
539446
 
7.1%
29101
 
5.3%
312735
 
2.3%
610236
 
1.9%
410059
 
1.8%
Other values (3)15922
 
2.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII552919
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0201596
36.5%
162876
 
11.4%
-58202
 
10.5%
:58202
 
10.5%
254544
 
9.9%
539446
 
7.1%
29101
 
5.3%
312735
 
2.3%
610236
 
1.9%
410059
 
1.8%
Other values (3)15922
 
2.9%

borough
Categorical

HIGH CORRELATION
MISSING
UNIFORM

Distinct: 6
Distinct (%): < 0.1%
Missing: 3043
Missing (%): 10.5%
Memory size: 227.5 KiB

EWR              4343
Staten Island    4343
Bronx            4343
Brooklyn         4343
Manhattan        4343

Length

Max length: 13
Median length: 7
Mean length: 7.333333333
Min length: 3

Characters and Unicode

Total characters: 191092
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Bronx
2nd row: Brooklyn
3rd row: EWR
4th row: Manhattan
5th row: Queens

Common Values

Value            Count   Frequency (%)
EWR              4343    14.9%
Staten Island    4343    14.9%
Bronx            4343    14.9%
Brooklyn         4343    14.9%
Manhattan        4343    14.9%
Queens           4343    14.9%
(Missing)        3043    10.5%

Length

Histogram of lengths of the category

Pie chart

Value        Count   Frequency (%)
island       4343    14.3%
queens       4343    14.3%
brooklyn     4343    14.3%
staten       4343    14.3%
ewr          4343    14.3%
bronx        4343    14.3%
manhattan    4343    14.3%

Most occurring characters

ValueCountFrequency (%)
n30401
15.9%
a21715
 
11.4%
t17372
 
9.1%
o13029
 
6.8%
e13029
 
6.8%
B8686
 
4.5%
r8686
 
4.5%
l8686
 
4.5%
s8686
 
4.5%
x4343
 
2.3%
Other values (13)56459
29.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter147662
77.3%
Uppercase Letter39087
 
20.5%
Space Separator4343
 
2.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n30401
20.6%
a21715
14.7%
t17372
11.8%
o13029
8.8%
e13029
8.8%
r8686
 
5.9%
l8686
 
5.9%
s8686
 
5.9%
x4343
 
2.9%
k4343
 
2.9%
Other values (4)17372
11.8%
Uppercase Letter
ValueCountFrequency (%)
B8686
22.2%
E4343
11.1%
W4343
11.1%
R4343
11.1%
M4343
11.1%
Q4343
11.1%
S4343
11.1%
I4343
11.1%
Space Separator
ValueCountFrequency (%)
4343
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin186749
97.7%
Common4343
 
2.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n30401
16.3%
a21715
11.6%
t17372
 
9.3%
o13029
 
7.0%
e13029
 
7.0%
B8686
 
4.7%
r8686
 
4.7%
l8686
 
4.7%
s8686
 
4.7%
x4343
 
2.3%
Other values (12)52116
27.9%
Common
ValueCountFrequency (%)
4343
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII191092
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n30401
15.9%
a21715
 
11.4%
t17372
 
9.1%
o13029
 
6.8%
e13029
 
6.8%
B8686
 
4.5%
r8686
 
4.5%
l8686
 
4.5%
s8686
 
4.5%
x4343
 
2.3%
Other values (13)56459
29.5%

pickups
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 3406
Distinct (%): 11.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 490.2159032
Minimum: 0
Maximum: 7883
Zeros: 5567
Zeros (%): 19.1%
Negative: 0
Negative (%): 0.0%
Memory size: 227.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 54
Q3: 449
95-th percentile: 2840
Maximum: 7883
Range: 7883
Interquartile range (IQR): 448

Descriptive statistics

Standard deviation: 995.6495355
Coefficient of variation (CV): 2.031042912
Kurtosis: 9.26766556
Mean: 490.2159032
Median Absolute Deviation (MAD): 54
Skewness: 2.976238116
Sum: 14265773
Variance: 991317.9975
Monotonicity: Not monotonic
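
Several of the pickups statistics are simple functions of the others. The sketch below checks three of them, assuming the usual definitions (coefficient of variation as standard deviation divided by mean, interquartile range as Q3 minus Q1, variance as the square of the standard deviation). The report does not spell these definitions out, so treat them as assumptions.

```python
import math

# Values copied from the pickups statistics in this report.
mean = 490.2159032
std = 995.6495355
variance = 991317.9975
q1, q3 = 1, 449
minimum, maximum = 0, 7883

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV = {cv:.6f}")
print(f"IQR = {iqr}, range = {value_range}")
print("variance matches std squared:",
      math.isclose(std ** 2, variance, rel_tol=1e-6))
```

A coefficient of variation above 2, together with a median far below the mean, is consistent with the strong right skew reported for this variable.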
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
0                      5567    19.1%
1                      2656    9.1%
2                      1698    5.8%
3                      937     3.2%
4                      474     1.6%
5                      257     0.9%
6                      128     0.4%
36                     85      0.3%
45                     84      0.3%
32                     81      0.3%
Other values (3396)    17134   58.9%
ValueCountFrequency (%)
05567
19.1%
12656
9.1%
21698
 
5.8%
3937
 
3.2%
4474
 
1.6%
5257
 
0.9%
6128
 
0.4%
777
 
0.3%
845
 
0.2%
946
 
0.2%
ValueCountFrequency (%)
78831
< 0.1%
78011
< 0.1%
77111
< 0.1%
75121
< 0.1%
72711
< 0.1%
72401
< 0.1%
71401
< 0.1%
71141
< 0.1%
70461
< 0.1%
69761
< 0.1%

spd
Real number (ℝ≥0)

ZEROS

Distinct114
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5.98492418
Minimum0
Maximum21
Zeros3596
Zeros (%)12.4%
Negative0
Negative (%)0.0%
Memory size227.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q13
median6
Q38
95-th percentile13
Maximum21
Range21
Interquartile range (IQR)5

Descriptive statistics

Standard deviation3.699007242
Coefficient of variation (CV)0.6180541525
Kurtosis0.4192409725
Mean5.98492418
Median Absolute Deviation (MAD)2
Skewness0.4190693213
Sum174167.2786
Variance13.68265458
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
53816
13.1%
03596
12.4%
63545
12.2%
33432
11.8%
73021
10.4%
82574
8.8%
91592
 
5.5%
101369
 
4.7%
11936
 
3.2%
13557
 
1.9%
Other values (104)4663
16.0%
ValueCountFrequency (%)
03596
12.4%
0.621
 
0.1%
0.7542
 
0.1%
166
 
0.2%
1.221
 
0.1%
1.2857142867
 
< 0.1%
1.5256
 
0.9%
1.67
 
< 0.1%
1.66666666735
 
0.1%
1.814
 
< 0.1%
ValueCountFrequency (%)
217
 
< 0.1%
2032
 
0.1%
1871
 
0.2%
17.57
 
< 0.1%
17120
0.4%
16.756
 
< 0.1%
16.57
 
< 0.1%
16.333333336
 
< 0.1%
16183
0.6%
15.7512
 
< 0.1%

vsb
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct179
Distinct (%)0.6%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean8.818124897
Minimum0
Maximum10
Zeros6
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size227.5 KiB

Quantile statistics

Minimum0
5-th percentile2.575
Q19.1
median10
Q310
95-th percentile10
Maximum10
Range10
Interquartile range (IQR)0.9

Descriptive statistics

Standard deviation2.442897359
Coefficient of variation (CV)0.2770313856
Kurtosis2.898539633
Mean8.818124897
Median Absolute Deviation (MAD)0
Skewness-2.042058313
Sum256616.2526
Variance5.967747505
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1021578
74.1%
9.11137
 
3.9%
8845
 
2.9%
7780
 
2.7%
6560
 
1.9%
4403
 
1.4%
5395
 
1.4%
3267
 
0.9%
0.3127
 
0.4%
3.5108
 
0.4%
Other values (169)2901
 
10.0%
ValueCountFrequency (%)
06
 
< 0.1%
0.3127
0.4%
0.33333333336
 
< 0.1%
0.36666666677
 
< 0.1%
0.420
 
0.1%
0.43333333337
 
< 0.1%
0.560
0.2%
0.620
 
0.1%
0.6514
 
< 0.1%
0.66666666677
 
< 0.1%
ValueCountFrequency (%)
1021578
74.1%
9.7757
 
< 0.1%
9.720
 
0.1%
9.5572
 
0.2%
9.33333333314
 
< 0.1%
9.11137
 
3.9%
8.73333333312
 
< 0.1%
8.5533
 
0.1%
8.57
 
< 0.1%
8.46
 
< 0.1%

temp
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct295
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean47.66904206
Minimum2
Maximum89
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size227.5 KiB

Quantile statistics

Minimum2
5-th percentile18
Q132
median46
Q364.5
95-th percentile79.5
Maximum89
Range87
Interquartile range (IQR)32.5

Descriptive statistics

Standard deviation19.81496901
Coefficient of variation (CV)0.4156779359
Kurtosis-1.037412126
Mean47.66904206
Median Absolute Deviation (MAD)16
Skewness0.05575251227
Sum1387216.793
Variance392.6329968
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
37675
 
2.3%
36595
 
2.0%
35548
 
1.9%
42531
 
1.8%
38529
 
1.8%
34514
 
1.8%
27508
 
1.7%
61502
 
1.7%
39494
 
1.7%
41494
 
1.7%
Other values (285)23711
81.5%
ValueCountFrequency (%)
220
 
0.1%
314
 
< 0.1%
485
0.3%
533
 
0.1%
641
0.1%
728
 
0.1%
853
0.2%
965
0.2%
1053
0.2%
1144
0.2%
ValueCountFrequency (%)
8928
 
0.1%
8856
 
0.2%
8754
 
0.2%
8661
 
0.2%
85144
0.5%
84212
0.7%
83159
0.5%
82289
1.0%
81.57
 
< 0.1%
81199
0.7%

dewp
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct305
Distinct (%)1.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean30.82306491
Minimum-16
Maximum73
Zeros303
Zeros (%)1.0%
Negative1950
Negative (%)6.7%
Memory size227.5 KiB

Quantile statistics

Minimum-16
5-th percentile-2
Q114
median30
Q350
95-th percentile64.66666667
Maximum73
Range89
Interquartile range (IQR)36

Descriptive statistics

Standard deviation21.28344434
Coefficient of variation (CV)0.6905038288
Kurtosis-1.035223571
Mean30.82306491
Median Absolute Deviation (MAD)18
Skewness0.0154181971
Sum896982.0119
Variance452.9850028
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
39578
 
2.0%
22525
 
1.8%
18518
 
1.8%
56508
 
1.7%
61492
 
1.7%
25486
 
1.7%
40473
 
1.6%
20465
 
1.6%
10460
 
1.6%
60450
 
1.5%
Other values (295)24146
83.0%
ValueCountFrequency (%)
-1634
 
0.1%
-1527
 
0.1%
-1386
0.3%
-1298
0.3%
-11121
0.4%
-1053
 
0.2%
-9100
0.3%
-892
0.3%
-7112
0.4%
-6155
0.5%
ValueCountFrequency (%)
737
 
< 0.1%
7214
 
< 0.1%
71.512
 
< 0.1%
71.257
 
< 0.1%
7170
0.2%
70.333333337
 
< 0.1%
70.285714296
 
< 0.1%
7062
0.2%
69.666666676
 
< 0.1%
69.333333336
 
< 0.1%

slp
Real number (ℝ≥0)

HIGH CORRELATION

Distinct413
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1017.817938
Minimum991.4
Maximum1043.4
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size227.5 KiB

Quantile statistics

Minimum991.4
5-th percentile1005.3
Q11012.5
median1018.2
Q31022.9
95-th percentile1030
Maximum1043.4
Range52
Interquartile range (IQR)10.4

Descriptive statistics

Standard deviation7.76879558
Coefficient of variation (CV)0.007632794917
Kurtosis0.06914463865
Mean1017.817938
Median Absolute Deviation (MAD)5.2
Skewness0.05284461782
Sum29619519.8
Variance60.35418476
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1020269
 
0.9%
1020.5255
 
0.9%
1019.9237
 
0.8%
1020.9227
 
0.8%
1021.1226
 
0.8%
1022.7214
 
0.7%
1020.2213
 
0.7%
1020.3209
 
0.7%
1021.2204
 
0.7%
1020.7204
 
0.7%
Other values (403)26843
92.2%
ValueCountFrequency (%)
991.47
 
< 0.1%
991.67
 
< 0.1%
992.37
 
< 0.1%
992.97
 
< 0.1%
993.47
 
< 0.1%
993.77
 
< 0.1%
994.17
 
< 0.1%
995.320
0.1%
996.27
 
< 0.1%
996.46
 
< 0.1%
ValueCountFrequency (%)
1043.47
< 0.1%
1043.37
< 0.1%
1043.27
< 0.1%
1043.17
< 0.1%
1042.96
< 0.1%
1042.27
< 0.1%
1041.77
< 0.1%
1041.67
< 0.1%
1041.27
< 0.1%
1041.114
< 0.1%

pcp01
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct80
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.003830149021
Minimum0
Maximum0.28
Zeros26468
Zeros (%)91.0%
Negative0
Negative (%)0.0%
Memory size227.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0.02
Maximum0.28
Range0.28
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.01893306515
Coefficient of variation (CV)4.943166713
Kurtosis87.81998828
Mean0.003830149021
Median Absolute Deviation (MAD)0
Skewness8.220954559
Sum111.4611667
Variance0.0003584609559
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
026468
91.0%
0.01439
 
1.5%
0.02181
 
0.6%
0.03147
 
0.5%
0.005141
 
0.5%
0.05115
 
0.4%
0.00333333333395
 
0.3%
0.0479
 
0.3%
0.01578
 
0.3%
0.0675
 
0.3%
Other values (70)1283
 
4.4%
ValueCountFrequency (%)
026468
91.0%
0.002540
 
0.1%
0.00333333333395
 
0.3%
0.005141
 
0.5%
0.00666666666762
 
0.2%
0.007533
 
0.1%
0.00813
 
< 0.1%
0.01439
 
1.5%
0.011666666677
 
< 0.1%
0.0126
 
< 0.1%
ValueCountFrequency (%)
0.2821
0.1%
0.26757
 
< 0.1%
0.267
 
< 0.1%
0.25333333337
 
< 0.1%
0.257
 
< 0.1%
0.24757
 
< 0.1%
0.1986
 
< 0.1%
0.196
 
< 0.1%
0.1713
< 0.1%
0.1527
0.1%

pcp06
Real number (ℝ≥0)

ZEROS

Distinct318
Distinct (%)1.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.02612874128
Minimum0
Maximum1.24
Zeros23460
Zeros (%)80.6%
Negative0
Negative (%)0.0%
Memory size227.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0.1875
Maximum1.24
Range1.24
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.09312533965
Coefficient of variation (CV)3.564095899
Kurtosis47.35606139
Mean0.02612874128
Median Absolute Deviation (MAD)0
Skewness5.936438429
Sum760.3725
Variance0.008672328884
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
023460
80.6%
0.01841
 
2.9%
0.02258
 
0.9%
0.03177
 
0.6%
0.05152
 
0.5%
0.005121
 
0.4%
0.04110
 
0.4%
0.0695
 
0.3%
0.0892
 
0.3%
0.00333333333387
 
0.3%
Other values (308)3708
 
12.7%
ValueCountFrequency (%)
023460
80.6%
0.002560
 
0.2%
0.00333333333387
 
0.3%
0.005121
 
0.4%
0.00666666666733
 
0.1%
0.007528
 
0.1%
0.0087
 
< 0.1%
0.0083333333337
 
< 0.1%
0.01841
 
2.9%
0.0133333333313
 
< 0.1%
ValueCountFrequency (%)
1.246
< 0.1%
1.226
< 0.1%
1.217
< 0.1%
1.0837
< 0.1%
1.0187
< 0.1%
1.01057
< 0.1%
0.8957
< 0.1%
0.8757
< 0.1%
0.867
< 0.1%
0.8357
< 0.1%

pcp24
Real number (ℝ≥0)

ZEROS

Distinct484
Distinct (%)1.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.09046437121
Minimum0
Maximum2.1
Zeros18631
Zeros (%)64.0%
Negative0
Negative (%)0.0%
Memory size227.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30.05
95-th percentile0.5755
Maximum2.1
Range2.1
Interquartile range (IQR)0.05

Descriptive statistics

Standard deviation0.2194022017
Coefficient of variation (CV)2.425288528
Kurtosis16.22082135
Mean0.09046437121
Median Absolute Deviation (MAD)0
Skewness3.605783873
Sum2632.603667
Variance0.04813732611
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
018631
64.0%
0.011259
 
4.3%
0.05367
 
1.3%
0.02306
 
1.1%
0.09271
 
0.9%
0.08333333333256
 
0.9%
0.06225
 
0.8%
0.08217
 
0.7%
0.1793333333152
 
0.5%
0.03149
 
0.5%
Other values (474)7268
 
25.0%
ValueCountFrequency (%)
018631
64.0%
0.002572
 
0.2%
0.00333333333391
 
0.3%
0.00587
 
0.3%
0.005833333333107
 
0.4%
0.00666666666719
 
0.1%
0.007521
 
0.1%
0.0087
 
< 0.1%
0.011259
 
4.3%
0.01552
 
0.2%
ValueCountFrequency (%)
2.113
 
< 0.1%
1.897
 
< 0.1%
1.50383333364
0.2%
1.4938333337
 
< 0.1%
1.497
 
< 0.1%
1.48883333321
 
0.1%
1.466
 
< 0.1%
1.426
 
< 0.1%
1.4188333337
 
< 0.1%
1.4138333337
 
< 0.1%

sd
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct421
Distinct (%)1.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.529169244
Minimum0
Maximum19
Zeros20167
Zeros (%)69.3%
Negative0
Negative (%)0.0%
Memory size227.5 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q32.958333333
95-th percentile12.16666667
Maximum19
Range19
Interquartile range (IQR)2.958333333

Descriptive statistics

Standard deviation4.520325424
Coefficient of variation (CV)1.787276765
Kurtosis1.313944097
Mean2.529169244
Median Absolute Deviation (MAD)0
Skewness1.589743978
Sum73601.35417
Variance20.43334194
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
020167
69.3%
81934
 
6.6%
11362
 
1.2%
12345
 
1.2%
9334
 
1.1%
7182
 
0.6%
1181
 
0.6%
13180
 
0.6%
2175
 
0.6%
0.87540
 
0.1%
Other values (411)5201
 
17.9%
ValueCountFrequency (%)
020167
69.3%
0.0416666666719
 
0.1%
0.0416666666713
 
< 0.1%
0.0833333333313
 
< 0.1%
0.0833333333319
 
0.1%
0.12539
 
0.1%
0.166666666732
 
0.1%
0.208333333333
 
0.1%
0.2539
 
0.1%
0.291666666734
 
0.1%
ValueCountFrequency (%)
197
< 0.1%
18.958333337
< 0.1%
18.916666676
< 0.1%
18.8756
< 0.1%
18.833333337
< 0.1%
18.791666677
< 0.1%
18.7514
< 0.1%
18.708333336
< 0.1%
18.666666677
< 0.1%
18.6257
< 0.1%

hday
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 28.5 KiB

Value    Count   Frequency (%)
False    27980   96.1%
True     1121    3.9%

Interactions

[Figures: interaction plots for pairs of numeric variables; images not included.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
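
The calculation described above can be sketched in a few lines of standard-library Python. The function name `pearson_r` and the sample data are illustrative assumptions; this is not the code the report generator itself runs.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Made-up data for illustration only; not taken from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(x, y)
# Shifting and rescaling x by a positive factor leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], y)
print(r, r_rescaled)
```

The population form (dividing by n) is used for both the covariance and the standard deviations; the sample form (dividing by n - 1) gives the same r because the factors cancel.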

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
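
A minimal standard-library sketch of this rank-based calculation follows. It assumes tied values receive the average of their ranks, which is a common convention but is not stated in the report; the helper names and the sample data are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson correlation applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y), pearson_r(x, y))
```

For this monotonic but nonlinear relationship, ρ reaches its maximum while Pearson's r stays below it, which is the difference the paragraph above describes.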

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
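
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition in standard-library Python; the function name and data are illustrative. Many statistics libraries report a tie-adjusted variant (tau-b) by default, so values computed elsewhere may differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither total
    return (concordant - discordant) / len(pairs)


# Made-up data for illustration only.
print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This direct version compares every pair and therefore runs in quadratic time, which is fine for illustration but slow for a column with tens of thousands of rows.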

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
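
A sketch of the bias-corrected estimator, written from the commonly cited form of Bergsma's correction, is given below. The function name and the contingency tables are illustrative assumptions, and the code assumes every row and column of the table has a non-zero total; it is not copied from the report generator.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Made-up tables: the first is independent, the second perfectly associated.
print(cramers_v_corrected([[10, 10], [10, 10]]))
print(cramers_v_corrected([[50, 0], [0, 50]]))
```

The two tables exercise both ends of the range described above: independence gives 0 and perfect association gives 1.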

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

    pickup_dt            borough        pickups  spd   vsb   temp  dewp  slp     pcp01  pcp06  pcp24  sd   hday
0   2015-01-01 01:00:00  Bronx          152      5.0   10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
1   2015-01-01 01:00:00  Brooklyn       1519     5.0   10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
2   2015-01-01 01:00:00  EWR            0        5.0   10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
3   2015-01-01 01:00:00  Manhattan      5258     5.0   10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
4   2015-01-01 01:00:00  Queens         405      5.0   10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
5   2015-01-01 01:00:00  Staten Island  6        5.0   10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
6   2015-01-01 01:00:00  NaN            4        5.0   10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
7   2015-01-01 02:00:00  Bronx          120      3.0   10.0  30.0  6.0   1023.0  0.0    0.0    0.0    0.0  Y
8   2015-01-01 02:00:00  Brooklyn       1229     3.0   10.0  30.0  6.0   1023.0  0.0    0.0    0.0    0.0  Y
9   2015-01-01 02:00:00  EWR            0        3.0   10.0  30.0  6.0   1023.0  0.0    0.0    0.0    0.0  Y

Last rows

pickup_dtboroughpickupsspdvsbtempdewpslppcp01pcp06pcp24sdhday
290912015-06-30 22:00:00Manhattan44525.010.076.064.01011.90.00.00.00.0N
290922015-06-30 22:00:00Queens5565.010.076.064.01011.90.00.00.00.0N
290932015-06-30 22:00:00Staten Island25.010.076.064.01011.90.00.00.00.0N
290942015-06-30 23:00:00Bronx677.010.075.065.01011.80.00.00.00.0N
290952015-06-30 23:00:00Brooklyn9907.010.075.065.01011.80.00.00.00.0N
290962015-06-30 23:00:00EWR07.010.075.065.01011.80.00.00.00.0N
290972015-06-30 23:00:00Manhattan38287.010.075.065.01011.80.00.00.00.0N
290982015-06-30 23:00:00Queens5807.010.075.065.01011.80.00.00.00.0N
290992015-06-30 23:00:00Staten Island07.010.075.065.01011.80.00.00.00.0N
291002015-06-30 23:00:00NaN37.010.075.065.01011.80.00.00.00.0N